arrival time
LLM Query Scheduling with Prefix Reuse and Latency Constraints
The efficient deployment of large language models (LLMs) in online settings requires optimizing inference performance under stringent latency constraints, particularly the time-to-first-token (TTFT) and time-per-output-token (TPOT). This paper focuses on the query scheduling problem for LLM inference with prefix reuse, a technique that leverages shared prefixes across queries to reduce computational overhead. Our work reveals previously unknown limitations of the existing first-come-first-serve (FCFS) and longest-prefix-match (LPM) scheduling strategies with respect to satisfying latency constraints. We present a formal theoretical framework for LLM query scheduling under RadixAttention, a prefix reuse mechanism that stores and reuses intermediate representations in a radix tree structure. Our analysis establishes the NP-hardness of the scheduling problem with prefix reuse under TTFT constraints, and we propose a novel scheduling algorithm, k-LPM, which generalizes existing methods by balancing prefix reuse and fairness in query processing. Theoretical guarantees demonstrate that k-LPM achieves improved TTFT performance under realistic traffic patterns captured by a data generative model. Empirical evaluations in a realistic serving setting validate our findings, showing significant reductions in P99 TTFT compared to baseline methods.
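To make the two baseline strategies named in the abstract concrete, the following is a minimal sketch of how FCFS and LPM would order the same queue. It is not the paper's implementation: the function names, the tuple-of-tokens query representation, and the use of a static list of cached sequences in place of a radix tree are all assumptions made for illustration. In a real server the cache changes as queries are processed, which this sketch ignores. k-LPM itself is not sketched, since the abstract does not specify it.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens that sequences a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cached_prefix_len(tokens, cache):
    """Longest prefix of `tokens` already present in any cached sequence.

    `cache` is a plain list of token sequences (an assumed stand-in for a
    radix tree; a linear scan gives the same match length for this sketch).
    """
    return max((shared_prefix_len(tokens, c) for c in cache), default=0)


def fcfs_order(queue):
    """First-come-first-serve: serve queries by arrival time."""
    return sorted(queue, key=lambda q: q[0])


def lpm_order(queue, cache):
    """Longest-prefix-match: serve the query with the longest cached prefix
    first; ties are broken by arrival time (an assumed tie-break rule)."""
    return sorted(queue, key=lambda q: (-cached_prefix_len(q[1], cache), q[0]))


# Hypothetical queue of (arrival_time, tokens) pairs and a one-entry cache.
cache = [("sys", "A", "x")]
queue = [
    (0, ("sys", "B")),       # shares 1 token with the cache
    (1, ("sys", "A", "y")),  # shares 2 tokens with the cache
    (2, ("other",)),         # shares nothing
]

print([q[0] for q in fcfs_order(queue)])
print([q[0] for q in lpm_order(queue, cache)])
```

In this toy queue, LPM serves the later-arriving query first because it reuses more of the cache. That is the kind of reordering that helps throughput but can delay queries with little cached prefix, which is the tension between prefix reuse and fairness that the abstract describes.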
Infinite Hidden Semi-Markov Modulated Interaction Point Process
Matt Zhang, Peng Lin, Ting Guo, Yang Wang, Fang Chen
The correlation between events is ubiquitous and important for temporal event modelling. In many cases, the correlation exists between not only events' emitted observations, but also their arrival times. State space models (e.g., the hidden Markov model) and stochastic interaction point process models (e.g., the Hawkes process) have been studied extensively yet separately for the two types of correlations in the past. In this paper, we propose a Bayesian nonparametric approach that considers both types of correlations by unifying and generalizing the hidden semi-Markov model and the interaction point process model. The proposed approach can simultaneously model both the observations and arrival times of temporal events, and automatically determine the number of latent states from data.
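As background for the interaction point processes mentioned in the abstract, a Hawkes process with an exponential kernel has conditional intensity λ(t) = μ + Σ_{t_i < t} α·exp(−β(t − t_i)), so each past arrival temporarily raises the rate of future arrivals. The sketch below evaluates that standard formula only; it is not the model proposed in the paper, and the parameter values for `mu`, `alpha`, and `beta` are illustrative assumptions.

```python
import math


def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t.

    mu    : baseline arrival rate (assumed value)
    alpha : jump in intensity caused by each past event (assumed value)
    beta  : exponential decay rate of that excitation (assumed value)
    Only events strictly before t contribute.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in history if t_i < t
    )
    return mu + excitation


# Hypothetical arrival times: the intensity is elevated shortly after an
# event and decays back toward the baseline mu as time passes.
events = [1.0, 1.2]
print(hawkes_intensity(1.3, events))
print(hawkes_intensity(10.0, events))
```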
Infinite Hidden Semi-Markov Modulated Interaction Point Process
The correlation between events is ubiquitous and important for temporal event modelling. In many cases, the correlation exists between not only events' emitted observations, but also their arrival times. State space models (e.g., the hidden Markov model) and stochastic interaction point process models (e.g., the Hawkes process) have been studied extensively yet separately for the two types of correlations in the past. In this paper, we propose a Bayesian nonparametric approach that considers both types of correlations by unifying and generalizing the hidden semi-Markov model and the interaction point process model. The proposed approach can simultaneously model both the observations and arrival times of temporal events, and determine the number of latent states from data.
Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction
Huang, Zhen, Deng, Jiaxin, Xu, Jiayu, Pang, Junbiao, Yu, Haitao
In bus arrival time prediction, the process of organizing road infrastructure network data into homogeneous entities is known as segmentation. Segmenting a road network is widely recognized as the first and most critical step in developing an arrival time prediction system, particularly for auto-regressive-based approaches. Traditional methods typically employ a uniform segmentation strategy, which fails to account for varying physical constraints along roads, such as road conditions, intersections, and points of interest, thereby limiting prediction efficiency. In this paper, we propose a Reinforcement Learning (RL)-based approach to efficiently and adaptively learn non-uniform road segments for arrival time prediction. Our method decouples the prediction process into two stages: 1) Non-uniform road segments are extracted based on their impact scores using the proposed RL framework; and 2) A linear prediction model is applied to the selected segments to make predictions. This method ensures optimal segment selection while maintaining computational efficiency, offering a significant improvement over traditional uniform approaches. Furthermore, our experimental results suggest that the linear approach can even achieve better performance than more complex methods. Extensive experiments demonstrate the superiority of the proposed method, which not only enhances efficiency but also improves learning performance on large-scale benchmarks.
The Curse of Shared Knowledge: Recursive Belief Reasoning in a Coordination Game with Imperfect Information
Bolander, Thomas, Engelhardt, Robin, Nicolet, Thomas S.
Common knowledge is crucial for safe group coordination. In its absence, humans must rely on shared knowledge, which is inherently limited in depth and therefore prone to coordination failures, because any finite-order knowledge attribution allows for an even higher order attribution that may change what is known by whom. In three separate experiments involving 802 participants, we investigate the extent to which humans can differentiate between common knowledge and nth-order shared knowledge. We designed a two-person coordination game with imperfect information to simplify the recursive game structure and higher-order uncertainties into a relatable everyday scenario. In this game, coordination for the highest payoff requires a specific fact to be common knowledge between players. However, this fact cannot become common knowledge in the game. The fact can at most be nth-order shared knowledge for some n. Our findings reveal that even at quite shallow depths of shared knowledge (low values of n), players behave as though they possess common knowledge, and claim similar levels of certainty in their actions, despite incurring significant penalties when falsely assuming guaranteed coordination. We term this phenomenon 'the curse of shared knowledge'. It arises either from the players' inability to distinguish between higher-order shared knowledge and common knowledge, or from their implicit assumption that their co-player cannot make this distinction.